Purpose: Hypothesis Testing

Instructions

1. Two-sample hypothesis testing

Perform the stated hypothesis test for each batch of experimental data.

  • A. Use a left-tailed t-test to evaluate whether sample #1 (S1) is statistically smaller than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2<S1)? Also report the mean and standard deviation of each sample. Data file: experiment_1.csv

  • B. Use a left-tailed t-test to evaluate whether sample #1 (S1) is statistically smaller than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2<S1)? Also report the mean and standard deviation of each sample. Data file: experiment_2.csv

  • C. Use a right-tailed t-test to evaluate whether sample #1 (S1) is statistically larger than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2>S1)? Also report the mean and standard deviation of each sample. Data file: experiment_3.csv

  • D. Use a right-tailed t-test to evaluate whether sample #1 (S1) is statistically larger than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2>S1)? Also report the mean and standard deviation of each sample. Data file: experiment_4.csv

  • E. Use a two-tailed t-test to evaluate whether there is a statistical difference between the following samples. Also report the mean and standard deviation of each sample. Data file: experiment_4-1.csv

  • F. Use a two-tailed t-test to evaluate whether there is a statistical difference between the following samples. Also report the mean and standard deviation of each sample. Data file: experiment_6.csv

Importing Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import scipy
from scipy import stats

Loading Datasets into Pandas DataFrames

In [2]:
df1 = pd.read_csv('experiment_1.csv')
df2 = pd.read_csv('experiment_2.csv')
df3 = pd.read_csv('experiment_3.csv')
df4 = pd.read_csv('experiment_4.csv')
df6 = pd.read_csv('experiment_6.csv')

A. Use a left-tailed t-test to evaluate whether sample #1 (S1) is statistically smaller than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2<S1)? Also report the mean and standard deviation of each sample.

In [3]:
df1.head()
Out[3]:
      S1     S2
0  24.35  25.20
1  24.75  24.45
2  24.10  25.10
3  23.70  24.75
4  24.45  25.65
In [4]:
t_stat, pval = stats.ttest_ind(df1['S1'], df1['S2'], alternative='less')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is smaller than S2")
else:
    print("S1 is not smaller than S2")
p-values 0.012180111249740139
S1 is smaller than S2
In [5]:
t_stat, pval = stats.ttest_ind(df1['S2'], df1['S1'], alternative='less')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S2 is smaller than S1")
else:
    print("S2 is not smaller than S1")
p-values 0.9878198887502598
S2 is not smaller than S1
In [6]:
df1_mean_std = df1.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df1_mean_std
Out[6]:
      mean       std
S1  24.415  0.385177
S2  24.950  0.570575
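
The two p-values reported in this part (about 0.0122 and 0.9878) add up to 1, which hints at what reversing the sample order does in a one-tailed test: the t statistic changes sign, so the left-tail probability becomes its complement. The following is a minimal sketch of that relationship on synthetic data; the arrays `a` and `b` and the random seed are assumptions made up for illustration and are not the experiment files.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only (assumed values, not experiment_1.csv).
rng = np.random.default_rng(0)
a = rng.normal(loc=24.4, scale=0.4, size=10)
b = rng.normal(loc=25.0, scale=0.6, size=10)

# Left-tailed test in both orders.
t_ab, p_ab = stats.ttest_ind(a, b, alternative='less')
t_ba, p_ba = stats.ttest_ind(b, a, alternative='less')

# Swapping the samples negates the t statistic ...
print(np.isclose(t_ab, -t_ba))
# ... so the two left-tailed p-values are complementary.
print(np.isclose(p_ab + p_ba, 1.0))
```

In other words, testing S2<S1 with the same left-tailed alternative is equivalent to testing S1>S2, so a small p-value in one order implies a large one in the other.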

B. Use a left-tailed t-test to evaluate whether sample #1 (S1) is statistically smaller than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2<S1)? Also report the mean and standard deviation of each sample.

In [7]:
df2.head()
Out[7]:
      S1     S2
0  24.35  25.20
1  24.75  24.45
2  24.10  25.10
3  23.70  24.75
4  24.45  25.65
In [8]:
t_stat, pval = stats.ttest_ind(df2['S1'], df2['S2'], alternative='less')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is smaller than S2")
else:
    print("S1 is not smaller than S2")
p-values 0.012180111249740139
S1 is smaller than S2
In [9]:
t_stat, pval = stats.ttest_ind(df2['S2'], df2['S1'], alternative='less')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S2 is smaller than S1")
else:
    print("S2 is not smaller than S1")
p-values 0.9878198887502598
S2 is not smaller than S1
In [10]:
df2_mean_std = df2.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df2_mean_std
Out[10]:
      mean       std
S1  24.415  0.385177
S2  24.950  0.570575

C. Use a right-tailed t-test to evaluate whether sample #1 (S1) is statistically larger than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2>S1)? Also report the mean and standard deviation of each sample.

In [11]:
df3.head()
Out[11]:
      S1     S2
0  25.20  25.65
1  24.45  24.75
2  25.10  25.15
3  24.75  24.85
4  25.65  25.00
In [12]:
t_stat, pval = stats.ttest_ind(df3['S1'], df3['S2'], alternative='greater')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is larger than S2")
else:
    print("S1 is not larger than S2")
p-values 0.7395693253906392
S1 is not larger than S2
In [13]:
t_stat, pval = stats.ttest_ind(df3['S2'], df3['S1'], alternative='greater')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S2 is larger than S1")
else:
    print("S2 is not larger than S1")
p-values 0.26043067460936076
S2 is not larger than S1
In [14]:
df3_mean_std = df3.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df3_mean_std
Out[14]:
      mean       std
S1  24.950  0.570575
S2  25.085  0.315392

D. Use a right-tailed t-test to evaluate whether sample #1 (S1) is statistically larger than sample #2 (S2). What happens if you reverse the order of the samples in the test (S2>S1)? Also report the mean and standard deviation of each sample.

In [15]:
df4.head()
Out[15]:
      S1     S2
0  25.20  24.35
1  24.45  24.75
2  25.10  24.10
3  24.75  23.70
4  25.65  24.45
In [16]:
t_stat, pval = stats.ttest_ind(df4['S1'], df4['S2'], alternative='greater')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is larger than S2")
else:
    print("S1 is not larger than S2")
p-values 0.012180111249740139
S1 is larger than S2
In [17]:
t_stat, pval = stats.ttest_ind(df4['S2'], df4['S1'], alternative='greater')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S2 is larger than S1")
else:
    print("S2 is not larger than S1")
p-values 0.9878198887502598
S2 is not larger than S1
In [18]:
df4_mean_std = df4.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df4_mean_std
Out[18]:
      mean       std
S1  24.950  0.570575
S2  24.415  0.385177

E. Use a two-tailed t-test to evaluate whether there is a statistical difference between the following samples. Also report the mean and standard deviation of each sample. Data file: experiment_4-1.csv

In [19]:
df4.head()
Out[19]:
      S1     S2
0  25.20  24.35
1  24.45  24.75
2  25.10  24.10
3  24.75  23.70
4  25.65  24.45
In [20]:
t_stat, pval = stats.ttest_ind(df4['S1'], df4['S2'], alternative='two-sided')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is different from S2")
else:
    print("S1 is not different from S2")
p-values 0.024360222499480277
S1 is different from S2
In [21]:
t_stat, pval = stats.ttest_ind(df4['S2'], df4['S1'], alternative='two-sided')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is different from S2")
else:
    print("S1 is not different from S2")
p-values 0.024360222499480277
S1 is different from S2
In [22]:
df4_mean_std = df4.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df4_mean_std
Out[22]:
      mean       std
S1  24.950  0.570575
S2  24.415  0.385177
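
The two-tailed p-value reported in this part (about 0.0244) is the same in both sample orders and is twice the one-tailed value obtained for the same data in part D (about 0.0122). The sketch below checks both properties on synthetic data; the arrays `a` and `b` and the seed are illustrative assumptions, not the experiment files.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only (assumed values, not the CSV data).
rng = np.random.default_rng(1)
a = rng.normal(loc=25.0, scale=0.6, size=10)
b = rng.normal(loc=24.4, scale=0.4, size=10)

_, p_two = stats.ttest_ind(a, b, alternative='two-sided')
_, p_two_swapped = stats.ttest_ind(b, a, alternative='two-sided')
_, p_less = stats.ttest_ind(a, b, alternative='less')
_, p_greater = stats.ttest_ind(a, b, alternative='greater')

# The two-sided test does not depend on the order of the samples.
print(np.isclose(p_two, p_two_swapped))
# The two-sided p-value is twice the smaller of the two one-sided p-values.
print(np.isclose(p_two, 2 * min(p_less, p_greater)))
```

Because the two-sided test looks at the absolute size of the t statistic, swapping the samples only flips its sign and leaves the p-value unchanged.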

F. Use a two-tailed t-test to evaluate whether there is a statistical difference between the following samples. Also report the mean and standard deviation of each sample. Data file: experiment_6.csv

In [23]:
df6.head()
Out[23]:
      S1     S2
0  24.35  25.20
1  24.75  24.45
2  24.10  25.10
3  23.70  24.75
4  24.45  25.65
In [24]:
t_stat, pval = stats.ttest_ind(df6['S1'], df6['S2'], alternative='two-sided')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is different from S2")
else:
    print("S1 is not different from S2")
p-values 0.024360222499480277
S1 is different from S2
In [25]:
t_stat, pval = stats.ttest_ind(df6['S2'], df6['S1'], alternative='two-sided')
print('p-values',pval)
if pval < 0.05:    # significance level alpha = 0.05 (5%)
    print("S1 is different from S2")
else:
    print("S1 is not different from S2")
p-values 0.024360222499480277
S1 is different from S2
In [26]:
df6_mean_std = df6.agg({'S1': ['mean','std'],
                        'S2': ['mean','std']}).T
df6_mean_std
Out[26]:
      mean       std
S1  24.415  0.385177
S2  24.950  0.570575
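
Parts A through F repeat the same three steps: run `stats.ttest_ind` with a given `alternative`, compare the p-value with 0.05, and report the mean and standard deviation. One possible way to consolidate them is sketched below. The helper name `summarize_ttest` and its return format are hypothetical, and the demo frame contains only the five rows displayed by `df1.head()`, so its numbers will differ from the full-file results above.

```python
import pandas as pd
from scipy import stats

def summarize_ttest(df, alternative, alpha=0.05):
    """Hypothetical helper: t-test on columns S1 and S2 plus summary stats."""
    # Independent two-sample t-test (SciPy default: equal variances assumed).
    t_stat, p_val = stats.ttest_ind(df['S1'], df['S2'], alternative=alternative)
    # Mean and sample standard deviation, one row per sample.
    summary = df[['S1', 'S2']].agg(['mean', 'std']).T
    return {
        't': float(t_stat),
        'p': float(p_val),
        'reject_null': bool(p_val < alpha),
        'summary': summary,
    }

# Demo data: only the five rows shown by df1.head(), NOT the full file.
demo = pd.DataFrame({
    'S1': [24.35, 24.75, 24.10, 23.70, 24.45],
    'S2': [25.20, 24.45, 25.10, 24.75, 25.65],
})

result = summarize_ttest(demo, alternative='less')
print('p-value', result['p'])
print('reject null at 5%:', result['reject_null'])
print(result['summary'])
```

With the full files loaded, the same call could be applied to each DataFrame in turn, for example `summarize_ttest(df3, alternative='greater')` for part C.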